MERGING COSTS FOR THE ADDITIVE MARCUS LUSHNIKOV 
PROCESS, AND UNION-FIND ALGORITHMS. 
PHILIPPE CHASSAING AND REGINE MARCHAND

Abstract. Starting with a monodisperse configuration with $n$ size-1 particles,
an additive Marcus-Lushnikov process evolves until it reaches its final
state (a unique particle with mass $n$). At each of the $n - 1$ steps of its evolution,
a merging cost is incurred, that depends on the sizes of the two particles
involved, and on an independent random factor. This paper studies the asymptotic
behaviour of the cumulated costs up to the $k$-th clustering, under various
regimes for $(n, k)$, with applications to the study of Union-Find algorithms.
1. Introduction, models and results

Fundamental to computer science is the manipulation of dynamic sets: sets that
can grow, shrink or otherwise change over time. Some algorithms, e.g. Kruskal's or
Prim's algorithms for the search of the minimum spanning tree of a graph, involve
grouping $n$ distinct elements into a collection of disjoint sets, and implementing
two operations: UNION, that unites two sets, and FIND, that finds which set a
given element belongs to (see [5, Part III] for more). For the analysis of the cost
of such operations, Yao [27] suggested two models, the spanning tree model and
the random graph model. Both are instances of a general model of coalescence of
particles, that we describe now.


1.1. Marcus-Lushnikov processes. The study of coalescence of particles (sets,
clusters) with different sizes has a long history, and has applications in many scientific
disciplines besides computer science, such as physical chemistry, but also astronomy, 
bubble swarms, and mathematical genetics (cf. the survey ^). In a basic model, 
clusters with different masses move through space, and when two clusters (say, with
masses $x$ and $y$) are sufficiently close, there is some chance that they merge into
a single cluster with mass $x + y$, with a probability quantified, in some sense, by
a rate kernel $K$, depending on the masses, the positions and the velocities of the
two clusters. However, such a model, including the spatial distribution of clusters
and their velocity, is still too complicated for analysis, so a rather natural first
approximation was suggested independently by Marcus [18] and Lushnikov [16, 17],
by considering kernels depending only on the masses of the clusters.

A Marcus-Lushnikov process with rate kernel $K$ is a continuous-time Markov process
whose state space is the set of partitions of $n$ or, equivalently, the set of measures
on the set $\mathbb{N}$ of positive integers of the form

$$\mu_t = \sum_{k} \frac{n(k,t)}{n}\,\delta_k,$$

in which $n(k, t)$ is an integer, and

$$\sum_{k} k\,n(k, t) = n,$$

so that $\int x\,\mu_t(dx) = 1$. The $k$'s stand for the sizes of clusters and $n(k,t)$ is the
number of clusters with size $k$ at time $t$. The size-$k$ clusters provide a fraction
$k\,n(k,t)/n$ of the total size $n$. A Marcus-Lushnikov process evolves by instantaneous
jumps according to the rule

each pair $(x, y)$ of clusters merges at rate $K(x, y)/n$.

In other words, the system of clusters jumps from the state $\mu$ to the state
$\mu + \frac{1}{n}(\delta_{x+y} - \delta_x - \delta_y)$ at rate $K(x,y)/n$, meaning that, if at time $t$ the state of the
system is $(x_i)_{i\ge 1}$, the next pair $(I, J)$ of clusters that merge and the time $t + T$
when they merge are jointly distributed as follows: assume we are given a set of
independent random variables $(T_{i,j})_{1\le i<j}$ with exponential distribution described
by

$$P(T_{i,j} > t) = \exp\left(-K(x_i, x_j)\,t/n\right),$$

and set

$$\inf_{i<j} T_{i,j} = T_{I,J} = T.$$

It follows, as usual for continuous time Markov chains, that $T_{I,J}$ and $(I, J)$ are
independent, that $T_{I,J}$ has an exponential law with parameter $\frac{1}{n}\sum_{i<j} K(x_i,x_j)$, and
that

(1)  $$P\big((I,J) = (i,j)\big) = \frac{K(x_i,x_j)}{\sum_{k<l} K(x_k,x_l)}.$$
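The competing-clocks description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`next_merge`, `merge`, `run`) are hypothetical, clusters are stored as a plain list of sizes, and every pair clock is redrawn at each jump, which is legitimate by memorylessness but far from efficient.

```python
import random

def next_merge(sizes, kernel, n, rng=random):
    """Draw one jump with competing exponential clocks.

    Each unordered pair (i, j) receives an independent exponential clock
    with rate kernel(x_i, x_j) / n; the first clock to ring selects the
    pair that merges. Returns (T, i, j) with i < j.
    """
    best = None
    for i in range(len(sizes)):
        for j in range(i + 1, len(sizes)):
            rate = kernel(sizes[i], sizes[j]) / n
            t = rng.expovariate(rate)
            if best is None or t < best[0]:
                best = (t, i, j)
    return best

def merge(sizes, i, j):
    """Return the list of cluster sizes after clusters i and j coalesce."""
    rest = [x for idx, x in enumerate(sizes) if idx not in (i, j)]
    return rest + [sizes[i] + sizes[j]]

def additive(x, y):
    """Additive kernel K(x, y) = x + y."""
    return x + y

def run(n, kernel=additive, seed=0):
    """Run the process from the monodisperse state until one cluster is left."""
    rng = random.Random(seed)
    sizes = [1] * n
    elapsed = 0.0
    while len(sizes) > 1:
        t, i, j = next_merge(sizes, kernel, n, rng)
        elapsed += t
        sizes = merge(sizes, i, j)
    return sizes, elapsed
```

Calling `run(12)` returns the final configuration `[12]` together with the (random) total time elapsed; swapping `additive` for another symmetric positive function gives the corresponding kernel.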



We shall see later that the additive Marcus-Lushnikov process (with kernel
$K(x, y) = x + y$) is embedded in the spanning tree model of Yao. The relation
between the random graph model and the multiplicative Marcus-Lushnikov process
(with kernel $K(x, y) = xy$) was noted by Knuth and Schönhage [15] and Stepanov
[26]. In both cases, the clusters are connected components of a graph, and the
merging of two clusters is due to the addition of an edge between elements of these
clusters. Also, we assume that the initial state consists of $n$ clusters with size 1; this
state is often called the monodisperse configuration. This corresponds to a totally
disconnected graph with $n$ vertices and no edges. Thus there are eventually $n - 1$
jumps (steps, mergings, ...) between the initial state $\delta_1$ and the final state $\frac{1}{n}\delta_n$ of
the Marcus-Lushnikov process. In this paper, we focus on the additive case.

1.2. Analysis of merging costs. At the $k$-th jump (addition of the $k$-th edge)
of the Marcus-Lushnikov process, two subsets with respective sizes $(s_{k,n}, S_{k,n})$,
$S_{k,n} \ge s_{k,n}$, are merged, at a cost $c_{k,n}$ that may depend on the sizes $(s_{k,n}, S_{k,n})$.
For instance, in some implementations, a label is maintained for each element,
signaling the set it belongs to, and when merging two sets, one has to change the
labels of the elements of one of the two sets. Yao, Knuth and Schönhage studied two
algorithms:

• Quick-Find, that updates the labels of one of the two sets, selected arbitrarily,
leading to cumulated costs

$$C^{QF}_{n,m} = \sum_{k=1}^{m} A_{k,n},$$

in which $A_{k,n} = s_{k,n}$ with probability 1/2 and $A_{k,n} = S_{k,n}$ with probability
1/2,

• and Quick-Find-Weighted, that updates the smaller set at a cost $c_{k,n} = s_{k,n}$,
leading to cumulated costs

$$C^{QFW}_{n,m} = \sum_{k=1}^{m} s_{k,n}.$$
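The two labelling strategies can be made concrete with a small label-array implementation. The sketch below is an illustration only, not the code analysed by Yao or by Knuth and Schönhage: the class name `QuickFind`, the `weighted` flag and the returned cost (number of labels rewritten) are choices made here for the example.

```python
import random

class QuickFind:
    """Label-array Union-Find: find is a lookup, union relabels one set."""

    def __init__(self, n, weighted=False, rng=None):
        self.label = list(range(n))                # label[i] names the set of i
        self.members = {i: [i] for i in range(n)}  # set name -> its elements
        self.weighted = weighted
        self.rng = rng or random.Random(0)

    def find(self, i):
        return self.label[i]

    def union(self, i, j):
        """Merge the sets of i and j; return the number of labels rewritten."""
        a, b = self.label[i], self.label[j]
        if a == b:
            return 0
        if self.weighted:
            # Quick-Find-Weighted: always relabel the smaller set (cost s).
            if len(self.members[a]) < len(self.members[b]):
                a, b = b, a
        elif self.rng.random() < 0.5:
            # Quick-Find: relabel either set with probability 1/2 (cost A).
            a, b = b, a
        moved = self.members.pop(b)
        for x in moved:
            self.label[x] = a
        self.members[a].extend(moved)
        return len(moved)
```

Summing the values returned by `union` over the first m unions gives the cumulated cost for the chosen strategy.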

In other contexts where coalescence of two sets occurs, costs of interest are
$L_{k,n}$, the size of one of the two sets chosen randomly with a probability that is
proportional to its size, i.e. $L_{k,n} = s_{k,n}$ with probability $s_{k,n}/(s_{k,n} + S_{k,n})$ and
$L_{k,n} = S_{k,n}$ with probability $S_{k,n}/(s_{k,n} + S_{k,n})$, or

(2)  $$R_{k,n} = s_{k,n} + S_{k,n} - L_{k,n},$$

or again

$$D_{k,n} = \lceil U_k L_{k,n} \rceil.$$

In the next Sections, some interpretations are given for these last costs. Here,
$(U_k)_{1\le k\le n-1}$ denotes a sequence of independent random variables, uniform on [0, 1].
In [15], using recurrence relations, Knuth and Schönhage give the following equivalents
for the total merging costs:



(3)  $$E\left[C^{QF}_{n,n-1}\right] = \sqrt{\frac{\pi}{8}}\;n^{3/2} + O(n \log n), \qquad E\left[C^{QFW}_{n,n-1}\right] = \frac{1}{\pi}\,n\log n + O(n),$$

in the case of the additive Marcus-Lushnikov process (log denotes the natural
logarithm). In this paper, we study concentration or limit laws for total costs
$C_{n,n-1}$ as well as for partial costs $C_{n,\lceil \alpha n\rceil}$. For the partial costs, we obtain the
following results:

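Theorem 1.1. For any $\eta \in (0, 1)$, and any positive $\varepsilon$,

$$\lim_{n} P\left(\sup_{\alpha\in[0,1-\eta]}\left|\frac{C^{QF}_{n,\lceil\alpha n\rceil}}{n} - \varphi^{QF}(\alpha)\right| \ge \varepsilon\right) = 0,$$

respectively

$$\lim_{n} P\left(\sup_{\alpha\in[0,1-\eta]}\left|\frac{C^{QFW}_{n,\lceil\alpha n\rceil}}{n} - \varphi^{QFW}(\alpha)\right| \ge \varepsilon\right) = 0,$$

in which

$$\varphi^{QF}(\alpha) = \frac{1}{2}\left(\frac{\alpha}{1-\alpha} + \log\frac{1}{1-\alpha}\right),$$

$$\varphi^{QFW}(\alpha) = \int_0^{\log\frac{1}{1-\alpha}} \sum_{k\in\mathbb{N}}\sum_{l\in\mathbb{N}} \frac{k+l}{2}\,(k\wedge l)\,q(k,t)\,q(l,t)\,dt,$$

$$q(k,t) = \frac{[k(1-e^{-t})]^{k-1}}{k!}\,e^{-t}\exp\left(-k(1-e^{-t})\right).$$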
This Theorem is actually a corollary of Theorem 3.1. Theorem 3.1 is stated and
proven in Section 3; it gives the expression, in terms of the solution $q(k,t)$ of the
Smoluchowski equation, of the limit function $\varphi^{c}(\alpha)$ for the partial costs:

$$C^{c}_{n,\lceil\alpha n\rceil} = \sum_{k=1}^{\lceil\alpha n\rceil} c(s_{k,n}, S_{k,n}, U_{k,n}),$$

once $C^{c}_{n,\lceil\alpha n\rceil}$ is normalized by $1/n$. For Theorem 3.1 to cover a wide class of costs
(starting with Quick-Find), the general expression $c(s_{k,n}, S_{k,n}, U_{k,n})$ for the instantaneous
cost of the $k$-th jump has to involve an extra-randomization parameter,
$U_{k,n}$, uniform on [0, 1]. Theorem 3.1 holds true under the mild condition of polynomial
growth, as a function of $s_{k,n}$ and $S_{k,n}$, of the instantaneous conditional
cost

$$c(s_{k,n}, S_{k,n}) = E\left[c(s_{k,n},S_{k,n}, U_{k,n}) \mid (s_{k,n},S_{k,n})\right].$$

For instance, the instantaneous conditional cost for Quick-Find is

$$E\left[A_{k,n} \mid (s_{k,n},S_{k,n})\right] = \frac{s_{k,n} + S_{k,n}}{2}.$$

For QFW and QF, the total costs are respectively $\Theta(n\log n)$ or $\Theta(n^{3/2})$, while
the partial costs are $\Theta(n)$: this is consistent with

$$\lim_{\alpha\to 1}\varphi^{c}(\alpha) = +\infty,$$

and $E[C^{QFW}_{n,n-1}] = o\big(E[C^{QF}_{n,n-1}]\big)$ is consistent with $\varphi^{QFW} = o(\varphi^{QF})$. Note that,
compared with the average-case analysis of [15], Theorem 1.1 adds some kind of
concentration result for partial costs. We turn now to a more precise study of the total
costs.

Detailed analysis of the total cost for QFB and QFW. Let us define 



and also, of course, 



$$C^{QFB}_{n,m} = \sum_{k=1}^{m} R_{k,n}.$$

An interpretation of $R_{k,n}$ in terms of the spanning tree model is given in the next
Sections (QFB stands for Quick-Find-Biased). We have

Theorem 1.2.

$$\frac{C^{QFB}_{n,n-1}}{n \log n}\ \xrightarrow{\ L^2\ }\ \frac{1}{2}.$$

From (2), $R_{k,n} = s_{k,n}$ with probability $S_{k,n}/(s_{k,n} + S_{k,n})$ and $R_{k,n} = S_{k,n}$ with
probability $s_{k,n}/(s_{k,n} + S_{k,n})$. As a consequence $R_{k,n}$ is more likely equal to the
smaller block $s_{k,n}$ than to $S_{k,n}$, so we expect similar behaviours for $C^{QFB}_{n,n-1}$ and
$C^{QFW}_{n,n-1}$. Moreover we expect a smaller variance for $C^{QFW}_{n,n-1}$ than for $C^{QFB}_{n,n-1}$, but we
could not produce a proof. However, in the light of Theorem 1.2, we conjecture
that

Conjecture 1.3.

$$\frac{C^{QFW}_{n,n-1}}{n \log n}\ \longrightarrow\ \frac{1}{\pi}.$$
Detailed analysis of the total cost for Quick-Find. Let $(e(t))_{0\le t\le 1}$ denote the
normalized Brownian excursion. For $C^{QF}_{n,n-1}$, we have the following result:

Theorem 1.4. $n^{-3/2}\,C^{QF}_{n,n-1}$ converges in law to $\int_0^1 e(t)\,dt$.
Actually, a more precise result is available: for $\beta \ge 0$, let

k=l 

$$h_\beta(t) = e(t) - \beta t - \inf_{0\le s\le t}\big(e(s) - \beta s\big),$$

$$W(\beta) = \int_0^1 h_\beta(t)\,dt.$$

Then

Theorem 1.5. $(W_n(\beta))_{\beta\ge 0}$ converges in law to $(W(\beta))_{\beta\ge 0}$.

Theorem 1.4 is the convergence of $W_n(0)$. For a detailed study of the family
$(W(\beta))_{\beta\ge 0}$, see ^. Since $\lim_{\beta\to+\infty} W(\beta) = 0$, Theorem 1.5 yields that:

Corollary 1.6. Assume that $\sqrt{n} = o(h_n)$ and $h_n \le n$. Then

Remark 1.7. As opposed to Quick-Find, the partial sums for Quick-Find-Biased 
satisfy 

lim(nlogn)-^E[c7«f„^_,^ J ^ \im{n\ogny' e\cZ^, 

for $h_n = o(n)$, and the same property holds for Quick-Find-Weighted. These
quite different behaviours for the partial and total costs of QF and QFW can be 
explained, partly, by the existence of several different regimes of convergence of the 
additive Marcus-Lushnikov process. 

1.3. Regimes of the additive Marcus-Lushnikov process. Denote by $B_{n,k}$
the size of the largest cluster after the $k$-th jump: interpretations based on fragmentation
of trees |21 1^ or on analysis of hashing algorithms [H] show that the
additive Marcus-Lushnikov process has three different regimes:

• the sparse regime: if $\sqrt{n} = o(n - k)$, then $B_{n,k}/n \to 0$ in probability;

• the transition regime: when $n - k = \Theta(\sqrt{n})$, several clusters of size $\Theta(n)$
coexist, and, once renormalized, clusters' sizes converge to the widths of
excursions of Brownian-like stochastic processes;

• the almost full regime: if $n - k = o(\sqrt{n})$, $B_{n,k}/n \to 1$ in probability, and
a unique giant cluster of size $n - o(n)$ coexists with smaller clusters with
total size $o(n)$.

Thus, the dramatic increase of $B_{n,k}$ (and, as a consequence, of $A_{k,n}$) during the
transition regime explains the huge contribution of the transition regime to the
sum $C^{QF}_{n,n-1}$, as quantified by Theorem 1.5 and by Corollary 1.6, and this in spite
of the fact that the transition regime involves a relatively small number of terms
of $C^{QF}_{n,n-1}$. Rather than $B_{n,k}$, the sizes of small clusters have an actual impact on
$C^{QFB}_{n,n-1}$ or $C^{QFW}_{n,n-1}$, since, in most of the jumps, $s_{k,n}$ is way smaller than $S_{k,n}$; thus
the quite different behaviour of QF and QFB reveals that, in some sense, the sizes
of small clusters have a moderate increase during the transition regime, the sparse
regime providing the largest contribution to $C^{QFB}_{n,n-1}$ or $C^{QFW}_{n,n-1}$. Also, the appearance
of the Brownian excursion area in Theorems 1.4 and 1.5 is typical of a phenomenon
linked with the transition regime, where the asymptotics of the parking scheme can
be described in terms of the standard additive coalescent 2, ^ ^ . 

The asymptotic behaviour of the partial costs $C_{n,\lceil\alpha n\rceil}$ is determined by the
behaviour of the additive Marcus-Lushnikov process during the sparse regime: once
suitably normalized, the additive Marcus-Lushnikov process converges to the
(deterministic) solution of Smoluchowski equations (cf. [20] or Theorem 3.2),
explaining the deterministic nature of the limits $\varphi^{QF}(\alpha)$ and $\varphi^{QFW}(\alpha)$ in Theorem
1.1.

The paper is organized as follows: in Section 2 we describe the embedding of the
additive Marcus-Lushnikov process in two combinatorial coalescence models, the
random spanning tree and the parking scheme. Through the first embedding, we can
rephrase the analysis of Union-Find algorithms in terms of the additive Marcus-Lushnikov
process. Convergence of Marcus-Lushnikov processes to solutions of
Smoluchowski equations is used in Section 3 to prove Theorem 1.1. In Section 4
we use some combinatorial properties of the parking scheme to bound the mean
and the variance of Quick-Find-Biased and prove Theorem 1.2. In Sections 5 and
6 we prove Theorems 1.4 and 1.5 about the total cost of Quick-Find, with the help
of the analysis of phase transitions for the parking, as given in [Hj . 

2. Two embeddings of the additive Marcus-Lushnikov process

Knuth, Schönhage and Yao make no use of Marcus-Lushnikov processes: their
analysis of average costs of UNION-FIND algorithms relies quite naturally
on probabilistic models defined in terms of random spanning trees, or in terms
of random graphs. Following [22], the next subsection recalls how the additive
Marcus-Lushnikov process $X^{(n)} = (X^{(n)}_t)_{t\ge 0}$ is embedded in the spanning tree
model. As a consequence, the analysis of partial costs for the additive Marcus-Lushnikov
process, given in Section 3, turns out to be a development of the analysis of Knuth,
Schönhage and Yao. The proofs of Sections 4 to 6 rely on the embedding of
the additive Marcus-Lushnikov process in the parking model, a model often used
to analyze linear probing in hashing tables [SI El- This last embedding is described 
in a second subsection. 

We start with a description of the additive Marcus-Lushnikov process that helps
to understand its connections to the spanning tree model and to the parking scheme:
at step $k$ pick a first cluster $P$ with a probability $|P|/n$ among the $n - k + 1$ clusters,
and let us call it the "predator" (being a size-biased pick it is likely larger than
the average cluster); then pick the "prey" $p$ uniformly among the $n - k$ remaining
clusters, and let $P$ eat $p$, producing a unique cluster with size $|P| + |p|$. It is not hard
to see that this defines the additive Marcus-Lushnikov process, and that $L_{k,n}$ (resp.
$R_{k,n}$) can be seen as the size of the predator (resp. of the prey). If, alternatively,
both clusters are size-biased picks (resp. if both are uniform picks), we obtain the
multiplicative Marcus-Lushnikov process (resp. the Marcus-Lushnikov process with
constant kernel, also called Kingman's process).
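The predator and prey description lends itself to a direct simulation of the jump chain and of the cumulated costs. The following sketch is an illustration under assumptions made here: the function name `simulate_costs` is hypothetical, times of jumps are ignored (the costs only depend on sizes), and the three accumulators follow the definitions of the Quick-Find, Quick-Find-Weighted and Quick-Find-Biased costs.

```python
import random

def simulate_costs(n, seed=0):
    """Run the n - 1 predator/prey steps and accumulate three merging costs.

    Returns (final_sizes, qf, qfw, qfb) where qf, qfw and qfb are the total
    Quick-Find, Quick-Find-Weighted and Quick-Find-Biased costs.
    """
    rng = random.Random(seed)
    sizes = [1] * n
    qf = qfw = qfb = 0
    for _ in range(n - 1):
        # Predator: size-biased pick, cluster P is chosen with probability |P|/n.
        pred = rng.choices(range(len(sizes)), weights=sizes)[0]
        # Prey: uniform pick among the remaining clusters.
        prey = rng.randrange(len(sizes) - 1)
        if prey >= pred:
            prey += 1
        L, R = sizes[pred], sizes[prey]
        small, large = min(L, R), max(L, R)
        qf += small if rng.random() < 0.5 else large   # A_{k,n}
        qfw += small                                   # s_{k,n}
        qfb += R                                       # R_{k,n}, size of the prey
        sizes[pred] = L + R
        # Remove the prey slot by overwriting it with the last entry; this is
        # also correct when the predator or the prey is the last entry.
        sizes[prey] = sizes[-1]
        sizes.pop()
    return sizes, qf, qfw, qfb
```

For moderate n, comparing `qfb / (n * math.log(n))` over several seeds gives a rough feeling for the slow convergence stated in Theorem 1.2; no numerical value is claimed here.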




2.1. The spanning tree model. Let $\mathcal{T}_n$ be the set of unrooted labeled trees with
$n$ vertices. As noted by Cayley, $\mathcal{T}_n$ has $n^{n-2}$ elements. Given a labeled tree $T \in \mathcal{T}_n$,
consider a labelling (or ordering) of its $n - 1$ edges. Let $T_k$ be the subgraph of $T$
whose $k$ edges have labels not larger than $k$: $T_k$ is a forest with $n - k$ connected
components. The connected components (trees) of the forest play the role of the
dynamic sets we mentioned earlier. We have:

• $T_0$ is the graph with no edges. It has $n$ size-1 components, that we call
monomers, following chemists' terminology. Also, $T_{n-1} = T$.

• $T_k$ is obtained from $T_{k-1}$ by addition of the edge labelled $k$ in $T$.

Following [15], let us call the sequence $(T_k)_{0\le k\le n-1}$ a spanning tree of $T$. Now,
there are $(n - 1)!$ orderings of the $n - 1$ edges of this tree, and thus the set $ST_n$ of
spanning trees has $n^{n-2} \times (n - 1)!$ elements. A random spanning tree is a random
uniform element of $ST_n$.
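A random spanning tree in this sense can be sampled by drawing a uniform labelled tree and then a uniform ordering of its edges. The sketch below uses the classical Prüfer-sequence bijection for the first step, which is a technique chosen here for the illustration and not one mentioned in the text; the function names are hypothetical and vertices are numbered from 0.

```python
import heapq
import random

def random_labeled_tree(n, rng):
    """Uniform labelled tree on {0, ..., n-1} (n >= 2) via a Pruefer sequence."""
    prufer = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in prufer:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in prufer:
        leaf = heapq.heappop(leaves)      # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def merge_sizes(n, seed=0):
    """Sizes (s, S) of the two components joined by each successive edge."""
    rng = random.Random(seed)
    edges = random_labeled_tree(n, rng)
    rng.shuffle(edges)                    # uniform ordering of the n - 1 edges
    parent = list(range(n))
    size = [1] * n

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        s, S = sorted((size[ru], size[rv]))
        pairs.append((s, S))
        parent[ru] = rv
        size[rv] = s + S
    return pairs
```

The list returned by `merge_sizes(n)` has n - 1 entries, one per edge, and its last entry always sums to n.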

Let $Y_k$ be the partition of the number $n$ induced by the connected components
of $T_k$. In [22], Pitman proves that conditionally given $(Y_i)_{0\le i\le k}$, the addition of
the $(k+1)$-th edge will merge two subtrees with respective sizes $x$ and $y$ with a
probability

$$\frac{x + y}{n(n - k - 1)}.$$

The same expression is obtained specializing relation (1) to the case $K(x,y) = a(x + y)$,
when $X^{(n)}$ has exactly $n - k$ clusters. Thus $Y^{(n)} = (Y_i)_{0\le i\le n-1}$ and
$X^{(n)} = (X^{(n)}_t)_{t\ge 0}$ have the same law, up to a time change: the jumps of $Y^{(n)}$ take place
at times $1, 2, \dots, n-1$, while the jumps of $X^{(n)}$ occur at random times (see the footnote;
actually the time elapsed between the $k$-th and $(k+1)$-th jumps of $X^{(n)}$ is random, exponentially
distributed with mean $\frac{1}{a(n-k-1)}$). As the merging costs do not depend on the
precise times of jumps, but only on the sizes of clusters that merge, this difference
does not matter: the total and partial costs have the same law in the additive
Marcus-Lushnikov process and in the spanning tree model. Thus the Yao-Knuth-Schönhage
problem fits in the more general frame of merging costs for Marcus-Lushnikov
processes.

In this context, $R_{k,n}$ and $L_{k,n}$ have the following interpretation: let any fixed
vertex be the root, once and for all, so that each edge has a bottom vertex (the
vertex that is closer to the root) and a top vertex. Erasing the $k$-th edge splits
a subtree of $T_k$ in two connected components (clusters), the ordered sizes of our
clusters being $s_{k,n} \le S_{k,n}$, with the notations of Section 1.2. It turns out that
the size of the cluster at the bottom of the $k$-th edge is a size-biased pick among
$\{s_{k,n}, S_{k,n}\}$. Thus $L_{k,n}$ (resp. $R_{k,n}$) can be seen as the size of the cluster at the
bottom (resp. at the top) of the $k$-th edge, just before the $k$-th jump.

2.2. The parking model. Consider a parking lot of $n$ places on a roundabout,
on which a set $C = \{1, 2, \dots, n - 1\}$ of $n - 1$ cars eventually park. Each car $c$ has a
clock that rings at a time $T_c$, and when the clock rings, the car $c$ tries to park on
a random place $t(c)$. If the first try $t(c)$ is on an empty place, the car parks there;
otherwise, the car tries the next places clockwise, and parks on the first empty
place it finds. The first tries $(t(c))_{c\in C}$ are assumed independent and uniform on the
$n$ places, numbered from 1 to $n$, and times $(T_c)_{c\in C}$ are assumed to be independent
exponentially distributed, with mean 1.

Figure 1. A sample of tries $t(c)$ and the resulting 3 clusters.
Here $n = 10 = 4 + 4 + 2$.

Footnote (on the time change of Section 2.1). However an exact identity between the two
processes is easily obtained through a standard randomization artifice: attach independent
exponential random times $t_e$ with mean 1 to each edge $e$ of a random uniform labeled tree
$T \in \mathcal{T}_n$, and let the edge $e$ appear at time $t_e$. Let $T_t$ be the subgraph of $T$ with
edges $e$ such that $t_e \le t$, and let $Y^{(n)}_t$ be the partition of $n$ induced by the
connected components of $T_t$. Then $Y^{(n)} = (Y^{(n)}_t)_{t\ge 0}$ is a Marcus-Lushnikov process
with kernel $K(x,y) = (x + y)/n$.

In this model, the clusters are the blocks of places already occupied, with the 
following conventions: 

• there are as many blocks as there are empty places, 

• a block contains an empty place and the set of consecutive occupied places 
before (going clockwise) this empty place, 

• the size of the block is the total number of places in it, including the empty 
place, 

• if an empty place follows another empty place, it is considered as a size-1 
block of its own. 

This way, the initial configuration, with n empty places, has n size-1 blocks (i.e. 
is monodisperse), and each time a car parks, two blocks merge, with conservation 
of the mass, as the empty place that disappears and the car that replaces it both 
count for one mass-unit. The final configuration, once the $n - 1$ cars are parked, has
a unique cluster with size n, and a unique empty place, with number V uniformly 
distributed on {1, 2, . . . , n}. 
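The parking scheme and the block conventions listed above can be sketched as follows. The helper names `park_all` and `block_sizes` are hypothetical, places are numbered from 0 rather than from 1, and the clocks are left out: since the first tries are independent and identically distributed, the order in which the cars arrive does not change the law of the configuration. The displacement is counted here as the number of extra places tried, so a car that parks at its first try has displacement 0.

```python
import random

def park_all(n, seed=0):
    """Park n - 1 cars on a circular lot of n places (places 0 .. n-1)."""
    rng = random.Random(seed)
    occupied = [False] * n
    displacements = []
    for _ in range(n - 1):
        first_try = rng.randrange(n)
        place = first_try
        while occupied[place]:            # try the next places clockwise
            place = (place + 1) % n
        occupied[place] = True
        displacements.append((place - first_try) % n)
    return occupied, displacements

def block_sizes(occupied):
    """Block sizes: each empty place closes a block made of the run of
    occupied places preceding it (clockwise) plus the empty place itself."""
    n = len(occupied)
    empties = [i for i in range(n) if not occupied[i]]
    sizes = []
    for idx, e in enumerate(empties):
        prev = empties[idx - 1]           # previous empty place, wrapping around
        sizes.append((e - prev) % n or n)
    return sizes
```

With empty places at positions 3, 7 and 9 on a lot of 10 places, `block_sizes` returns `[4, 4, 2]`, the decomposition quoted in the caption of Figure 1.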

It turns out that the sizes of blocks form an additive Marcus-Lushnikov process,
with kernel $K(x,y) = (x + y)/n$: given that the parking scheme with $n$ places, $k$
cars already parked and $\ell = n - k$ empty places, has two blocks with sizes $x \ge y$,
the probability that these two blocks merge at the next arrival is

(4)  $$\frac{x + y}{n(n - k - 1)}.$$

Actually, as follows from equiprobability for the $n^k$ possible configurations, the
number $N_{x,y}$ of empty places after block $x$ (clockwise) but before block $y$ is random
uniform on $\{1, 2, \dots, \ell - 1\}$. If $N_{x,y} \notin \{1, \ell - 1\}$, there is no way the two clusters
can merge at the next arrival. Given that $N_{x,y} = 1$ (resp. $\ell - 1$) the conditional
probability that the two blocks merge at the next arrival is the probability that the
next time a clock rings, the first try of the corresponding car will be on one of the
$x$ (resp. $y$) places of the largest (resp. smallest) cluster:

$$\frac{x}{n}, \quad \text{resp. } \frac{y}{n},$$

leading to (4). Another consequence is that the size of the block before (clockwise)
the place filled by the $k$-th arrival is a random size-biased choice among $\{s_{k,n}, S_{k,n}\}$:
$L_{k,n}$ and $R_{k,n}$ can be seen as the sizes of blocks before (clockwise) and after the
place filled by the $k$-th arrival, and $D_{k,n}$ as the displacement of the car between its
first try and its final place.

From the parking interpretation, we deduce now some explicit computations for
the law of the weighted blocks $L_{k,n}$ and $R_{k,n}$, that shed some light on the asymptotic
behaviour of $s_{k,n}$ and $S_{k,n}$. Consider the conditional probability $p^{(j,n)}_{m,k}$ that, in an
additive Marcus-Lushnikov process with size $n$, the $j$-th predator has size $k$, before
the $j$-th meal, given that its size after the $j$-th meal is $m$. From now on, we assume
the Marcus-Lushnikov process to be embedded in a parking scheme. In particular,
we retain the interpretation of $L_{k,n}$ and $R_{k,n}$ as the sizes of blocks before and after
the place filled by the $k$-th arrival, so that $p^{(j,n)}_{m,k}$ is the probability that, in a parking
scheme with $n$ places, the block before the place filled by the $j$-th arrival has size $k$,
given that the block created by the $j$-th arrival has size $m$. It turns out, for combinatorial
reasons, that $p^{(j,n)}_{m,k}$ does not depend on $j$ or $n$. Thus we have, for instance,

$$p^{(j,n)}_{m,k} = p^{(m-1,m)}_{m,k} = P(L_{m-1,m} = k) = P(R_{m-1,m} = m - k),$$

and we shall drop the exponent, for the sake of brevity. From the asymptotic behaviour
of $p_{m,k}$, we expect some intuition about the respective values of $L_{k,n}$ and $R_{k,n}$.

Lemma 2.1.

$$p_{m,k} = \frac{1}{m^{m-2}}\binom{m-2}{k-1}\,k^{k-1}\,(m-k)^{m-k-2}.$$

Proof. Recall that the size of a cluster is defined as the number of cars in the block
plus one. There are $\binom{m-2}{k-1}$ possible choices for the $k - 1$ cars in the block after
$V$ (clockwise), and $k^{k-2}$ possible parking schemes for these cars; also, there are
$(m - k)^{m-k-2}$ possible parking schemes for the $m - k - 1$ cars in the block before
$V$, and finally, $k$ possible first tries for the last car if $V$ is to be the last empty
place. □

Lemma 2.1 and Stirling's formula yield at once that

Corollary 2.2.

(5)  $$\forall k \ge 1, \qquad \lim_{m\to\infty} p_{m,m-k} = \frac{k^{k-1}e^{-k}}{k!}.$$
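Lemma 2.1 and Corollary 2.2 are easy to check numerically with exact rational arithmetic. The sketch below assumes the formula of Lemma 2.1 as reconstructed above; the function names `p` and `borel` are introduced here for the illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial

def p(m, k):
    """p_{m,k} of Lemma 2.1 as an exact fraction, for 1 <= k <= m - 1."""
    # Fraction powers keep the k = m - 1 case exact, where the exponent of
    # (m - k) = 1 is negative.
    return (comb(m - 2, k - 1) * Fraction(k) ** (k - 1)
            * Fraction(m - k) ** (m - k - 2) / Fraction(m) ** (m - 2))

def borel(k):
    """Borel distribution with parameter 1: k^(k-1) e^(-k) / k!."""
    return k ** (k - 1) * exp(-k) / factorial(k)
```

For a fixed m, the values `p(m, k)` for k from 1 to m - 1 sum to 1 exactly, and `float(p(m, m - k))` approaches `borel(k)` as m grows.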

The limit distribution is the so-called Borel distribution, tightly related to ex- 
plicit solutions of Smoluchowski equations PP, and to the tree function or Lambert's 
function [14]. Thus, in distribution, $R_{m-1,m} = O(1)$ in some sense. However,
note that the Borel distribution has infinite mean, in coherence with the fact that
$E[R_{m-1,m}] = \Theta(\sqrt{m})$. We shall retain that, provided $L_{k,n} + R_{k,n}$ is large, $R_{k,n}$
or $s_{k,n}$ are negligible, compared with $L_{k,n}$. As a consequence, $S_{k,n}$ and $L_{k,n}$ should
have quite similar behaviours. This is a first tentative explanation of the drastic
difference between QF and QFW, revealed by the results of Knuth & Schönhage.
Remark 2.3. The convergence of the Marcus-Lushnikov process to the solution of
the Smoluchowski equation, derived by analytic arguments in [20], is quite natural
for the additive case in the light of the following computations. The probability
$p(\alpha n)$ that, after the $\alpha n$-th arrival, the first car to be parked belongs to a size-$k$
cluster, is

$$p(\alpha n) = \binom{\alpha n - 1}{k-2}\,\frac{k^{k-2}\,(n-k)^{\alpha n-k}\,(n-\alpha n-1)\,n}{n^{\alpha n}} \simeq (1-\alpha)\,\alpha^{k-2}\,\frac{k^{k-2}}{(k-2)!}\,e^{-\alpha k}.$$

As the size-$k$ clusters provide a fraction $k\,n(k,t)/n$ of the total size, they also provide
a fraction $(k-1)\,n(k,t)/n(t)$ of the total number $n(t)$ of cars arrived at time $t$, so the
probability $p(t)$ that, at time $t$, the first car to be parked belongs to a size-$k$ cluster
is precisely $(k-1)\,n(k,t)/n(t)$. We shall see later that the $\alpha n$-th arrival takes place at a
time $t_\alpha \simeq -\log(1 - \alpha)$, so that $p(-\log(1 - \alpha)) \simeq p(\alpha n)$, or, equivalently:

$$\frac{(k-1)\,n(k,-\log(1-\alpha))}{\alpha n} \simeq (1-\alpha)\,\alpha^{k-2}\,\frac{k^{k-2}}{(k-2)!}\,e^{-\alpha k}.$$

The right hand side turns out to be the expression of $\frac{k-1}{\alpha}\,q(k, -\log(1 - \alpha))$.

3. Analysis of partial costs after $\lceil\alpha n\rceil$ coalescences

In this Section we state and prove Theorem 3.1, and Theorem 1.1 follows as a
direct consequence. As opposed to the next Sections, the proofs make no use of
richer combinatorial structures in which the additive Marcus-Lushnikov process is
embedded, and they could very likely be generalized to a suitable class of kernels
$K$. We assume that the cost incurred at the $k$-th step is

$$c(s_{k,n}, S_{k,n}, U_{k,n}),$$

in which $(U_{k,n})_{k\in\mathbb{N}, n\in\mathbb{N}}$ denote a sequence of independent identically distributed
random variables uniform on [0, 1]: this covers the case of Quick-Find, in which the cost
$A_{k,n}$ can be written

$$A_{k,n} = s_{k,n}\,1_{U_{k,n} < 0.5} + S_{k,n}\,1_{U_{k,n} \ge 0.5}.$$

The size of the prey $R_{k,n}$ can be written

$$R_{k,n} = s_{k,n}\,1_{U_{k,n} \le \frac{S_{k,n}}{s_{k,n}+S_{k,n}}} + S_{k,n}\,1_{U_{k,n} > \frac{S_{k,n}}{s_{k,n}+S_{k,n}}},$$

the size of the predator and the displacement have similar descriptions. We suppose
that there exist $A > 0$ and $p, q \in \mathbb{N}$ such that:

$$\forall x \in \mathbb{N},\ \forall y \in \mathbb{N}, \qquad h(x,y) = \int_0^1 c^2(x,y,u)\,du \le A\,x^p y^q.$$

We set, for $1 \le m \le n - 1$,

$$C^{c}_{n,m} = \sum_{k=1}^{m} c(s_{k,n}, S_{k,n}, U_{k,n}).$$

Then the asymptotic behaviour of $C^{c}_{n,\lceil\alpha n\rceil}$ can be described in terms of the
instantaneous conditional cost

$$c(x,y) = \int_0^1 c(x,y,u)\,du = E\left[c(s_{k,n},S_{k,n},U_{k,n}) \mid (s_{k,n},S_{k,n}) = (x,y)\right],$$
and of the solution of the Smoluchowski equation with additive kernel (see Subsection
3.1 below):

$$q(k,t) = \frac{[k(1-e^{-t})]^{k-1}}{k!}\,e^{-t}\exp\left(-k(1-e^{-t})\right).$$

We have

Theorem 3.1. For any $\eta > 0$,

$$\sup_{\alpha\in[0,1-\eta]}\left|\frac{C^{c}_{n,\lceil\alpha n\rceil}}{n} - \varphi^{c}(\alpha)\right| \xrightarrow{\ P\ } 0,$$

in which $\varphi^{c}$ is an increasing function from $[0,1)$ to $\mathbb{R}_+$ defined by

$$\varphi^{c}(\alpha) = \int_0^{\log\frac{1}{1-\alpha}} \sum_{k\in\mathbb{N}}\sum_{l\in\mathbb{N}} \frac{k+l}{2}\,c(k,l)\,q(k,t)\,q(l,t)\,dt.$$

Thus, $\varphi^{c}$ corresponds to a renormalized partial cost until time $\log\frac{1}{1-\alpha}$ in the
infinite particle system governed by Smoluchowski equation. In the table below, we
give the explicit values of $\varphi^{c}$ for some examples:



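Cost                         $c(x,y)$                     $\varphi^{c}(\alpha)$

Quick-Find $A_{k,n}$         $\frac{x+y}{2}$              $\frac{1}{2}\left(\frac{\alpha}{1-\alpha} + \log\frac{1}{1-\alpha}\right)$

Prey size $R_{k,n}$          $\frac{2xy}{x+y}$            $\log\frac{1}{1-\alpha}$

Predator size $L_{k,n}$      $\frac{x^2+y^2}{x+y}$        $\frac{\alpha}{1-\alpha}$

Displacement $D_{k,n}$       $\frac{x^2+y^2}{2(x+y)}$     $\frac{\alpha}{2(1-\alpha)}$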
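For Quick-Find-Weighted, $c(x,y)$ has the simple form $\min(x,y)$, but we could not
produce an expression more explicit than

$$\varphi^{QFW}(\alpha) = \int_0^{\log\frac{1}{1-\alpha}} \sum_{k\in\mathbb{N}}\sum_{l\in\mathbb{N}} \frac{k+l}{2}\,(k\wedge l)\,q(k,t)\,q(l,t)\,dt.$$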
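Even without a closed form, the limit function can be evaluated numerically. The sketch below is a rough illustration under assumptions made here: it takes the integrand to be (k + l)/2 times c(k, l) q(k, t) q(l, t), which is the reading consistent with the closed-form value for Quick-Find, truncates the double sum at a hypothetical cutoff `kmax`, and integrates in time with the composite Simpson rule. The function names are not from the paper.

```python
from math import exp, lgamma, log

def q(k, t):
    """Additive Smoluchowski solution q(k, t), computed in log-space."""
    lam = 1.0 - exp(-t)
    if lam == 0.0:
        return 1.0 if k == 1 else 0.0
    return exp((k - 1) * log(k * lam) - lgamma(k + 1) - t - k * lam)

def rate(c, t, kmax):
    """Truncated double sum of (k + l)/2 * c(k, l) * q(k, t) * q(l, t)."""
    qs = [q(k, t) for k in range(1, kmax + 1)]
    total = 0.0
    for k in range(1, kmax + 1):
        for l in range(1, kmax + 1):
            total += 0.5 * (k + l) * c(k, l) * qs[k - 1] * qs[l - 1]
    return total

def phi(c, alpha, steps=40, kmax=150):
    """Composite Simpson approximation of the integral up to log(1/(1-alpha)).

    steps must be even; kmax should grow as alpha gets close to 1.
    """
    T = -log(1.0 - alpha)
    h = T / steps
    total = rate(c, 0.0, kmax) + rate(c, T, kmax)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * rate(c, i * h, kmax)
    return total * h / 3.0
```

For instance `phi(min, 0.5)` approximates the Quick-Find-Weighted limit at α = 1/2, while `phi(lambda x, y: (x + y) / 2, 0.5)` can be compared with the closed-form Quick-Find value; the default cutoffs are only adequate away from α = 1.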

Note that a similar expression appears in the analysis of Union-Find algorithms
under the random graph model (kernel $K(x, y) = xy$): Bollobás & Simon [5] proved
that the average cost of QFW is $cn + O(n/\log n)$, in which:



log 2 



I k^ 

1 + E k-irE 



k + (.-2\ 



fc>i 



£\ (fc + £)'=+^-i 



3.1. The additive Smoluchowski equation. The proof of Theorem 3.1 relies
on the convergence of the additive Marcus-Lushnikov process to the solution of the
Smoluchowski equation with additive kernel. Let $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$ denote the set of positive
measures on $\mathbb{N}$ with total mass less than or equal to 1. A (deterministic) solution $\mu$ of
the additive Smoluchowski equation is a family $\mu = (\mu_t)_{t\ge 0}$ of measures in $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$,

$$\mu_t = \sum_{k\in\mathbb{N}} q(k,t)\,\delta_k,$$

that satisfy:

(S)  i) $\forall k \in \mathbb{N}$, $q(k,0) = \delta_1(k)$,

     ii) $\forall k \in \mathbb{N}$, $\forall t \ge 0$,

$$\frac{d}{dt}\,q(k,t) = \frac{1}{2}\sum_{j=1}^{k-1} k\,q(j,t)\,q(k-j,t) - q(k,t)\sum_{j=1}^{\infty} (j+k)\,q(j,t).$$

The coefficient $q(k, t)$ can be seen as the concentration of particles of size $k$ at time
$t$ in a given volume unit, for an infinite system of particles. The first term on the
right hand side of the Smoluchowski equation (S) corresponds to the creation of a
particle with size $k$ due to coalescence between smaller particles, of size $j$ and $k - j$,
at a rate $j + (k - j) = k$, and the second term to the destruction of a particle with
size $k$, through coalescence with another particle of size $j$, at a rate $k + j$.
In the additive case, there exists a unique solution to (S), given by:

(6)  $$\forall k \in \mathbb{N},\ \forall t \ge 0, \qquad q(k,t) = \frac{[k(1-e^{-t})]^{k-1}}{k!}\,e^{-t}\,e^{-k(1-e^{-t})}$$

(see Aldous PP). All the moments of this solution can be explicitly computed, and 
for instance: 

$$\forall t \ge 0, \qquad \langle \mu_t, x\rangle = 1, \qquad \langle \mu_t, 1\rangle = e^{-t}, \qquad \langle \mu_t, x^2\rangle = e^{2t}.$$

The first equality says that the mass is preserved during coalescences, the second 
one says that the concentration (number of particles per unit volume) decreases 
exponentially, and the third one gives the exponential increase of the mean size of 
a tagged (size biased) particle. 
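The three moment identities can be checked numerically from the explicit solution. The sketch below assumes the formula $q(k,t) = [k(1-e^{-t})]^{k-1} e^{-t} e^{-k(1-e^{-t})}/k!$ and truncates the series at a hypothetical cutoff `kmax`; the logarithmic form avoids overflow in $k^{k-1}$ and $k!$.

```python
from math import exp, lgamma, log

def q(k, t):
    """Explicit solution of the additive Smoluchowski equation, in log-space."""
    lam = 1.0 - exp(-t)
    if lam == 0.0:
        return 1.0 if k == 1 else 0.0     # monodisperse initial condition
    log_q = (k - 1) * log(k * lam) - lgamma(k + 1) - t - k * lam
    return exp(log_q)

def moment(p, t, kmax=5000):
    """Truncated p-th moment: sum of k^p q(k, t) for k = 1 .. kmax."""
    return sum(k ** p * q(k, t) for k in range(1, kmax + 1))
```

At t = 1, `moment(0, 1.0)`, `moment(1, 1.0)` and `moment(2, 1.0)` should be close to $e^{-1}$, 1 and $e^{2}$ respectively; for larger t the tail of q decays more slowly and `kmax` has to be increased.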

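3.2. The infinitesimal generator of the additive Marcus-Lushnikov process.
An alternative definition of the additive Marcus-Lushnikov process, through
its infinitesimal generator, is more suitable for our computations. An additive
Marcus-Lushnikov process $(\mu^n_t)_{t\ge 0}$ is a continuous time càdlàg Markov process
with values in $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$, satisfying the set $(ML_n)$ of conditions below:

i. $\mu^n_0 = \delta_1$,

ii. $\forall t \ge 0$, $\mu^n_t \in \left\{\frac{1}{n}\sum_{i=1}^{k}\delta_{x_i},\ k \in \mathbb{N},\ \forall i\ x_i \in \mathbb{N},\ \sum_{i=1}^{k} x_i = n\right\}$,

iii. its generator $L$ is given by: for every measurable $\psi : \mathcal{M}^{+}_{\le 1}(\mathbb{N}) \to \mathbb{R}$ and every $\mu = \frac{1}{n}\sum_{i=1}^{k}\delta_{x_i}$,

$$L\psi(\mu) = \frac{1}{2}\sum_{i\ne j}\frac{x_i + x_j}{n}\left[\psi\Big(\mu + \frac{1}{n}\big(\delta_{x_i+x_j} - \delta_{x_i} - \delta_{x_j}\big)\Big) - \psi(\mu)\right].$$

In the last term, for symmetry reasons, the additive kernel appears with a factor 1/2.
It is well known that, for every $n$, $(ML_n)$ has a unique solution $(\mu^n_t)_{t\ge 0}$ (which
is a collection of random measures in $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$), satisfying moreover the mass
conservation property:

$$\forall t \ge 0, \qquad \langle \mu^n_t, x\rangle = 1 \quad \text{a.s.}$$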

3.3. Convergence of the solution of $(ML_n)$ to the solution of (S). We recall
here some definitions and theorems of convergence for the additive Marcus-Lushnikov
process.

1. On $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$, the vague convergence of measures is defined as follows:

$$(\mu_n)_{n\in\mathbb{N}} \to \mu \iff \forall \psi \in C_c(\mathbb{N},\mathbb{R}),\ \langle \mu_n, \psi\rangle \to \langle \mu, \psi\rangle,$$

in which $C_c(\mathbb{N}, \mathbb{R})$ denotes the space of functions from $\mathbb{N}$ to $\mathbb{R}$ with compact support.
We assume that $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$ is endowed with the vague topology (which is metrizable).
Denote by $\mathbb{D}([0, T], \mathcal{M}^{+}_{\le 1}(\mathbb{N}))$ the set of càdlàg functions from $[0, T]$ to $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$,
endowed with the Skorokhod topology IIUI. 

Denote by $(\mu^n_t)_{t\ge 0}$ the solution of $(ML_n)$ and by $(\mu_t)_{t\ge 0}$ the solution of (S).
Our analysis makes use of the following convergence theorem (it is a refinement,
due to [21], of a well known result of [20]), and of some direct consequences listed
below:

Theorem 3.2. For every $T > 0$,

$$(\mu^n_t)_{t\in[0,T]} \longrightarrow (\mu_t)_{t\in[0,T]}.$$

Here we mean convergence in distribution. 

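2. As $(\mu_t)_{t\ge 0}$ is deterministic, the convergence in distribution implies the
convergence in probability, that is, if $d$ denotes a metric yielding the Skorokhod topology
on $\mathbb{D}([0,T], \mathcal{M}^{+}_{\le 1}(\mathbb{N}))$, we have:

$$\forall T > 0,\ \forall \varepsilon > 0, \qquad P\left(d\big[(\mu^n_t)_{t\in[0,T]}, (\mu_t)_{t\in[0,T]}\big] \ge \varepsilon\right) \to 0.$$

3. Since the limit $t \mapsto \mu_t$ is continuous, convergence for the Skorokhod topology
entails uniform convergence on every $[0,T]$: for any metric $d_v$ yielding the vague
topology on $\mathcal{M}^{+}_{\le 1}(\mathbb{N})$, we have

$$\forall T > 0,\ \forall \varepsilon > 0, \qquad P\left(\sup_{t\in[0,T]} d_v[\mu^n_t, \mu_t] \ge \varepsilon\right) \to 0.$$

4. Finally, we have

Proposition 3.3. For any function $\varphi$ from $\mathbb{N}$ to $\mathbb{R}$ satisfying, for some $A > 0$ and
$p \in \mathbb{N}$, $|\varphi(k)| \le A k^p$,

$$\forall T > 0,\ \forall \varepsilon > 0, \qquad P\left(\sup_{t\in[0,T]} \left|\langle \mu^n_t, \varphi\rangle - \langle \mu_t, \varphi\rangle\right| \ge \varepsilon\right) \to 0.$$

When $\varphi$ is a function from $\mathbb{N}$ to $\mathbb{R}$ with compact support, Proposition 3.3 follows
directly from point 3, but for the class of functions with polynomial growth, we
need some bounds on the moments $\langle \mu_t, x^p\rangle$ and $E[\langle \mu^n_t, x^p\rangle]$: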

Lemma 3.4. For every $p \ge 2$, there exist positive constants $A_p$ and $B_p$ such that
for every $t \ge 0$:

(7) \[ E[\langle \mu_t^n, x^p\rangle] \le e^{B_p t}, \]

(8) \[ \langle \mu_t, x^p\rangle \le A_p e^{2(p-1)t}. \]

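Proof. We derive relation (7) using the special form of the infinitesimal generator of
a Marcus-Lushnikov process (cf. $(ML_n)$). To this aim, some additional notations
are handy: for a function $\psi$ from $\mathbb{N}^2$ to $\mathbb{R}_+$ and a measure $\mu = \frac{1}{n}\sum_{i=1}^{k}\delta_{x_i}$, let us
define

\[ \langle \mu \odot \mu, \psi\rangle = \langle \mu\otimes\mu, \psi\rangle - \frac{1}{n}\int \psi(x,x)\,\mu(dx). \]

When $\mu = \frac{1}{n}\sum_{i=1}^{k}\delta_{x_i}$, then

\[ \langle \mu \odot \mu, \psi\rangle = \frac{1}{n^2}\sum_{i\neq j}\psi(x_i,x_j). \]

We have

\[ E[\langle \mu_t^n, x^p\rangle] = 1 + E\left[\int_0^t \left\langle \mu_s^n \odot \mu_s^n, \left((x+y)^p - x^p - y^p\right)\frac{x+y}{2}\right\rangle ds\right]. \]

Since $\left((x+y)^p - x^p - y^p\right)\frac{x+y}{2} \le (2^{p-1}-1)(x^p y + y^p x)$ for all $x$ and $y$ in $[0,+\infty)$,

\[ E[\langle \mu_t^n, x^p\rangle] \le 1 + (2^{p-1}-1)\int_0^t E\left[\langle \mu_s^n \odot \mu_s^n, x^p y + y^p x\rangle\right] ds \]
\[ \le 1 + (2^{p-1}-1)\int_0^t E\left[\langle \mu_s^n \otimes \mu_s^n, x^p y + y^p x\rangle\right] ds \]
\[ \le 1 + (2^{p}-2)\int_0^t E\left[\langle \mu_s^n, x^p\rangle\right] ds, \]

the last relation making use of the mass conservation property. Now (7) follows
from Gronwall's Lemma, with $B_p = 2^p - 2$. Similar techniques lead to inequality (8); the complete proof
can be found in ,2,. □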
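The elementary inequality behind the Gronwall argument above can be checked numerically. The following is only a sanity-check sketch, not part of the original proof: it tests $((x+y)^p - x^p - y^p)(x+y)/2 \le (2^{p-1}-1)(x^p y + y^p x)$ in exact integer arithmetic on a finite grid of cluster sizes. The function names and the grid bound `max_size` are illustrative assumptions.

```python
def kernel_inequality_holds(p, max_size=30):
    """Check ((x+y)^p - x^p - y^p)(x+y)/2 <= (2^(p-1) - 1)(x^p y + y^p x)
    for all integer sizes 1 <= x, y <= max_size.

    Both sides are multiplied by 2 so that only integers are compared.
    """
    c = 2 ** (p - 1) - 1
    for x in range(1, max_size + 1):
        for y in range(1, max_size + 1):
            lhs_times_2 = ((x + y) ** p - x ** p - y ** p) * (x + y)
            rhs_times_2 = 2 * c * (x ** p * y + y ** p * x)
            if lhs_times_2 > rhs_times_2:
                return False
    return True


def gronwall_rate(p):
    """Exponential rate 2^p - 2 obtained from the last line of the chain,
    using mass conservation <mu, x> = 1."""
    return 2 ** p - 2


if __name__ == "__main__":
    for p in range(2, 7):
        print(p, kernel_inequality_holds(p), gronwall_rate(p))
```

For $p = 2$ the two sides coincide for all $x, y$, and for every $p$ they coincide when $x = y$, so the constant $2^{p-1}-1$ cannot be lowered.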

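Proof of Proposition 3.3. We consider

\[ \alpha_{K,n} = P\left( \sup_{t\in[0,T]} \left| \langle \mu_t^n - \mu_t, \varphi\,1_{[0,K)}\rangle \right| > \varepsilon/3 \right), \]
\[ \beta_K = \sup_{t\in[0,T]} \left| \langle \mu_t, \varphi\,1_{[K,+\infty)}\rangle \right|, \]
\[ \gamma_{K,n} = P\left( \sup_{t\in[0,T]} \left| \langle \mu_t^n, \varphi\,1_{[K,+\infty)}\rangle \right| > \varepsilon/3 \right). \]

First,

\[ E\left[ \sup_{t\in[0,T]} \left| \langle \mu_t^n, \varphi\,1_{[K,+\infty)}\rangle \right| \right] \le A\,E\left[ \sup_{t\in[0,T]} \langle \mu_t^n, x^p\,1_{[K,+\infty)}\rangle \right] \]
\[ \le A K^{-p}\,E\left[ \sup_{t\in[0,T]} \langle \mu_t^n, x^{2p}\rangle \right] \le A K^{-p}\,E\left[ \langle \mu_T^n, x^{2p}\rangle \right], \]

the last inequality due to the fact that $t \mapsto \langle \mu_t^n, x^{2p}\rangle$ is increasing, as a consequence
of $a^p + b^p \le (a+b)^p$. Thus (7) and Markov inequality lead to a uniform
bound

\[ \gamma_{K,n} \le 3 A K^{-p} e^{B_{2p}T} \varepsilon^{-1}. \]
Also,

\[ \beta_K \le A\,A_{2p}\,K^{-p} e^{2(2p-1)T}. \]

As a consequence, $K$ can be tuned to make $\sup_n \gamma_{K,n}$ arbitrarily small, and simultaneously
$\beta_K$ smaller than $\varepsilon/3$. Once $K$ is chosen, we use $\lim_n \alpha_{K,n} = 0$ to conclude. □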




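5. By a similar proof, for every function $\psi$ from $\mathbb{N}^2$ to $\mathbb{R}$ such that $|\psi(k,l)| \le A k^p l^q$,
we have

(9) \[ \lim_n P\left( \sup_{t\in[0,T]} \left| \langle \mu_t^n \otimes \mu_t^n, \psi\rangle - \langle \mu_t \otimes \mu_t, \psi\rangle \right| > \varepsilon \right) = 0, \]

for any $T$ and $\varepsilon$ positive.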

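3.4. Merging costs as functionals of $(ML_n)$. In this subsection, we prove Theorem 3.1.
Let $(U_s^n)_{s\ge 0}$ denote a family of independent and identically distributed
random variables, uniform on $[0,1]$ and independent of $(\mu_t^n)_{t\ge 0}$. When a coalescence
occurs at time $s$ ($\mu_{s-}^n \neq \mu_s^n$), we assume that a nonnegative cost $c(\mu_{s-}^n, \mu_s^n, U_s^n)$ is
incurred, with

\[ c\left( \frac{1}{n}\sum_{i=1}^{k}\delta_{x_i},\ \frac{1}{n}\sum_{i=3}^{k}\delta_{x_i} + \frac{1}{n}\delta_{x_1+x_2},\ u \right) = c(x_1, x_2, u) \]

if $k \in \{2, \dots, n\}$, $(x_i)_{1\le i\le k} \in \mathbb{N}^k$, and $u \in [0,1]$, and with $c(\mu, \nu, u)$ null otherwise.
Furthermore, we assume that there exist $A > 0$ and $p, q \in \mathbb{N}$ such that:

\[ h(x,y) = \int_0^1 c^2(x,y,u)\,du \le A x^p y^q, \qquad \forall x \in \mathbb{N},\ \forall y \in \mathbb{N}. \]

Then the partial cost up to time $t$ is

\[ C_t^n = \sum_{0 < s \le t} c(\mu_{s-}^n, \mu_s^n, U_s^n). \]

Recall that $c(x,y) = \int_0^1 c(x,y,u)\,du$. According to [2S1 Ch. IV, Lemma (21.13)],
we have

\[ \frac{C_t^n}{n} = \int_0^t \left\langle \mu_s^n \odot \mu_s^n, c(x,y)\frac{x+y}{2}\right\rangle ds + M_t^n \]
\[ = \int_0^t \left\langle \mu_s^n \otimes \mu_s^n, c(x,y)\frac{x+y}{2}\right\rangle ds - \frac{1}{n}\int_0^t \langle \mu_s^n, x\,c(x,x)\rangle\,ds + M_t^n, \]

in which $M^n$ is a martingale such that

\[ \langle M^n\rangle_t = \frac{1}{n}\int_0^t \left\langle \mu_s^n \odot \mu_s^n, h(x,y)\frac{x+y}{2}\right\rangle ds. \]

Set

\[ C_t = \int_0^t \left\langle \mu_s \otimes \mu_s, c(x,y)\frac{x+y}{2}\right\rangle ds = \int_0^t \iint \frac{x+y}{2}\,c(x,y)\,d\mu_s(x)\,d\mu_s(y)\,ds. \]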

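As a consequence of the convergence of the solution $(\mu_t^n)_{t\ge 0}$ of $(ML_n)$ to the solution
$(\mu_t)_{t\ge 0}$ of $(S)$, we get:

Theorem 3.5. For every cost $c$ such that there exist $A > 0$ and $p, q \in \mathbb{N}$ with
$\forall x \in \mathbb{N}$, $\forall y \in \mathbb{N}$, $h(x,y) = \int_0^1 c^2(x,y,u)\,du \le A x^p y^q$, we have, for each positive $T$
and $\varepsilon$,

\[ \lim_n P\left( \sup_{t\in[0,T]} \left| \frac{C_t^n}{n} - C_t \right| > \varepsilon \right) = 0. \]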
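Proof. First we bound the martingale and the diagonal term. By Doob's inequality,
we obtain

(10) \[ E\left[ \sup_{t\in[0,T]} |M_t^n|^2 \right] \le 4\,E\left[\langle M^n\rangle_T\right], \]

but Lemma 3.4 yields that

\[ E\left[\langle M^n\rangle_T\right] \le \frac{A}{2n}\int_0^T E\left[\langle \mu_s^n \otimes \mu_s^n, x^p y^q (x+y)\rangle\right] ds, \]

a bound that vanishes as $n$ grows to infinity. For the diagonal term

\[ D_t^n = \frac{1}{n}\int_0^t \langle \mu_s^n, x\,c(x,x)\rangle\,ds, \]

observe that, since $c(x,x) \le h(x,x)^{1/2} \le \sqrt{A}\,x^{p+q}$,

\[ D_t^n \le \frac{\sqrt{A}}{n}\int_0^t \langle \mu_s^n, x^{p+q+1}\rangle\,ds, \]

and that $t \mapsto D_t^n$ is increasing. Thus it is enough to control the terminal value:

(11) \[ E\left[ \sup_{t\in[0,T]} D_t^n \right] \le \frac{\sqrt{A}}{n}\int_0^T E\left[\langle \mu_s^n, x^{p+q+1}\rangle\right] ds \le \frac{\sqrt{A}}{n}\,T e^{B_{p+q+1}T}, \]

that vanishes as $n$ grows to infinity. Then, with the help of (9), we bound the
integral terms: for any positive $T$ and $\varepsilon$, we have

\[ \lim_n P\left( \sup_{t\in[0,T]} \left| \int_0^t \left\langle \mu_s^n \otimes \mu_s^n - \mu_s \otimes \mu_s, c(x,y)\frac{x+y}{2}\right\rangle ds \right| > \varepsilon \right) = 0. \]

Finally, as usual,

\[ P\left( \sup_{t\in[0,T]} \left| \frac{C_t^n}{n} - C_t \right| > \varepsilon \right) \le P\left( \sup_{t\in[0,T]} \left| \int_0^t \left\langle \mu_s^n \otimes \mu_s^n - \mu_s \otimes \mu_s, c(x,y)\frac{x+y}{2}\right\rangle ds \right| > \varepsilon/3 \right) \]
\[ + P\left( \sup_{t\in[0,T]} |M_t^n| > \varepsilon/3 \right) + P\left( \sup_{t\in[0,T]} D_t^n > \varepsilon/3 \right), \]

and the three terms on the right hand side vanish, the first one by step 2, the second
(resp. third) term by (10) (resp. (11)) and by Markov inequality. □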

Proof of Theorem 3.1. For analysis of algorithms or combinatorics, the fact that
Marcus-Lushnikov processes are continuous-time processes looks like an artefact:
this artefact will prove useful if we can convert Theorem 3.5, a result about the
cumulated cost at a deterministic time, into a result about the cumulated cost after
a deterministic number of jumps. Thus we have to establish a close connection
between the cumulated cost $C_t^n$ up to time $t$, defined in the previous section, and
the cumulated costs $C_{n,m}$ or $C_{n,\lceil \alpha n\rceil}$ involved in Theorem 3.1. For $\alpha \in [0,1)$, set:

T" = inf 1 1 > 0, < fil\ 1 > < 1-a- - 

T^ is the time when the [Q;n]-th coalescence occurs, when the total number of 
clusters becomes smaller than (1 — a)n — 1. Thus 

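(12) \[ \langle \mu^n_{T_\alpha^n}, 1\rangle \simeq 1 - \alpha, \]
and

(13) \[ C^n_{T_\alpha^n} = C_{n,\lceil \alpha n\rceil}. \]

As a consequence of Proposition 3.3, for any positive $T$ and $\varepsilon$, we have

\[ \lim_n P\left( \sup_{t\in[0,T]} \left| \langle \mu_t^n, 1\rangle - \langle \mu_t, 1\rangle \right| > \varepsilon \right) = 0. \]

Since $\langle \mu_t, 1\rangle = e^{-t}$, relation (12) leads to $e^{-T_\alpha^n} \simeq 1 - \alpha$, and the following Lemma
is not unexpected: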

Lemma 3.6. For any positive $\varepsilon$ and $\eta$,

\[ \lim_n P\left( \sup_{\alpha\in[0,1-\eta]} \left| T_\alpha^n + \log(1-\alpha) \right| > \varepsilon \right) = 0. \]

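Proof. Assume that for some $\alpha \in [0, 1-\eta]$, we have:

\[ T_\alpha^n + \log(1-\alpha) > \varepsilon, \]

or

\[ T_\alpha^n + \log(1-\alpha) < -\varepsilon. \]

The first inequality ensures that for any time $t_0 < -\log(1-\alpha) + \varepsilon \le \varepsilon - \log\eta$,
$\langle \mu_{t_0}^n, 1\rangle$ is larger than $1-\alpha$, and if for instance we choose $t_0 \ge -\log(1-\alpha) + \varepsilon/2$,
we obtain

\[ \left| \langle \mu_{t_0}^n, 1\rangle - \langle \mu_{t_0}, 1\rangle \right| \ge \eta\,(1 - e^{-\varepsilon/2}). \]

The second inequality ensures that at time

\[ t_1 = -\log(1-\alpha) - \varepsilon \ge 0, \]

we have $\langle \mu_{t_1}^n, 1\rangle \le 1-\alpha$, and as a consequence

\[ \left| \langle \mu_{t_1}^n, 1\rangle - \langle \mu_{t_1}, 1\rangle \right| \ge \eta\,(1 - e^{-\varepsilon/2}). \]

Then we use Proposition 3.3 with $T = \varepsilon - \log\eta$. □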
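The time change described by Lemma 3.6 can be observed on a simulated path. The sketch below is illustrative only and rests on assumptions not spelled out in this section: each unordered pair of clusters of sizes $x$ and $y$ is taken to merge at rate $(x+y)/n$, so that with $k$ clusters the total rate is $k-1$ and the merging pair can be drawn by picking one cluster size-biased and the other uniformly among the rest. The function names, and the convention of comparing the time of the $m$-th coalescence with $-\log(1-m/n)$, are ours.

```python
import math
import random


def simulate_additive_ml(n, seed=0):
    """Simulate one path of an additive Marcus-Lushnikov process started from
    n clusters of size 1 (assumed pair rate: (x + y) / n).

    Returns the list of jump times and the final list of cluster sizes.
    """
    rng = random.Random(seed)
    sizes = [1] * n
    t = 0.0
    jump_times = []
    while len(sizes) > 1:
        k = len(sizes)
        # Total rate: sum over pairs of (x_i + x_j)/n = (k - 1) * n / n = k - 1.
        t += rng.expovariate(k - 1)
        # Pair {i, j} has probability (x_i + x_j) / (n (k - 1)):
        # i is a size-biased pick, j is uniform among the other clusters.
        i = rng.choices(range(k), weights=sizes)[0]
        j = rng.randrange(k - 1)
        if j >= i:
            j += 1
        sizes[i] += sizes[j]
        sizes[j] = sizes[-1]  # move the last cluster into slot j, then shrink
        sizes.pop()
        jump_times.append(t)
    return jump_times, sizes


def max_time_deviation(jump_times, n, eta):
    """Largest |T_m + log(1 - m/n)| over coalescence counts m with m/n <= 1 - eta,
    where T_m is the time of the m-th coalescence."""
    worst = 0.0
    for m, t in enumerate(jump_times, start=1):
        alpha = m / n
        if alpha <= 1 - eta:
            worst = max(worst, abs(t + math.log(1 - alpha)))
    return worst


if __name__ == "__main__":
    times, final_sizes = simulate_additive_ml(1000, seed=1)
    print(final_sizes, max_time_deviation(times, 1000, eta=0.2))
```

With this rate convention the jump times do not depend on the cluster sizes at all; the sizes matter only for the costs attached to each merge.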

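Finally, we combine relation (13), Theorem 3.5 and Lemma 3.6 to deduce the
proof of Theorem 3.1. Recall that

\[ \varphi(\alpha) = C_{\log\frac{1}{1-\alpha}}. \]

Given any positive numbers $\beta$, $\varepsilon$ and $\eta$, we can write:

\[ 1\left\{ \sup_{\alpha\le 1-\eta} \left| \frac{C^n_{T_\alpha^n}}{n} - C_{\log\frac{1}{1-\alpha}} \right| > \varepsilon \right\} \le 1\left\{ \sup_{\alpha\le 1-\eta} |T_\alpha^n + \log(1-\alpha)| \ge \beta \right\} \]
\[ + 1\left\{ \sup_{\alpha\le 1-\eta} \left| \frac{C^n_{T_\alpha^n}}{n} - C_{T_\alpha^n} \right| > \varepsilon/2,\ \sup_{\alpha\le 1-\eta} |T_\alpha^n + \log(1-\alpha)| < \beta \right\} \]
\[ + 1\left\{ \sup_{\alpha\le 1-\eta} \left| C_{T_\alpha^n} - C_{\log\frac{1}{1-\alpha}} \right| > \varepsilon/2,\ \sup_{\alpha\le 1-\eta} |T_\alpha^n + \log(1-\alpha)| < \beta \right\} \]
\[ \le 1\left\{ \sup_{\alpha\le 1-\eta} |T_\alpha^n + \log(1-\alpha)| \ge \beta \right\} + 1\left\{ \sup_{t\le \beta - \log\eta} \left| \frac{C_t^n}{n} - C_t \right| > \varepsilon/2 \right\} \]
\[ + 1\left\{ \sup\left\{ |C_t - C_s| \ :\ s,t \in [0, \beta - \log\eta],\ |t-s| \le \beta \right\} > \varepsilon/2 \right\}. \]

For $\beta$ small enough the third term of the last sum vanishes, by the uniform continuity
of $t \mapsto C_t$. Theorem 3.5 and Lemma 3.6 take care of the two other terms. □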

4. Analysis of the total cost of Quick-Find-Biased 

4.1. Average case analysis. In this subsection, as a first step for the proof of
Theorem 1.2, we prove the convergence of the first moment of $R_n/(n\log n)$, using
the parking representation. In the next subsection, a bound for the variance of
$R_n/(n\log n)$ completes the proof of Theorem 1.2. We have:

Lemma 4.1.
\[ \lim_n E\left[ \frac{C_n^{QFB}}{n\log n} \right] = \frac{1}{2}. \]

The next Lemma is of constant use in the rest of the paper:

Lemma 4.2. For any $k \in \{1, \dots, n-1\}$,
\[ E[R_{k,n} \mid L_{k,n}] = \frac{n - L_{k,n}}{n-k}. \]

Proof. As in Section IT^ we assume the Marcus-Lushnikov process to be embedded
in a parking scheme. Let us number the blocks clockwise from 0 to $n-k$, starting
with the block before the place filled by the $k$-th arrival, and let $\beta_i$ denote the size
of the $i$-th block (so that $(\beta_0, \beta_1) = (L_{k,n}, R_{k,n})$). It is easy to see that among the
n^ parking configurations, there are 

(14) f. .. V\ l)"'°^'^-' 



60 - 1,61 - 1,. 



, bn-k 



configurations such that $(\beta_i)_{0\le i\le n-k} = (b_i)_{0\le i\le n-k}$. As a consequence, the family
$(\beta_i)_{1\le i\le n-k}$ is exchangeable, while $\beta_0$, being a size-biased pick among the $n-k+1$
blocks, tends to be larger. With the additional fact that

\[ \sum_{i=0}^{n-k} \beta_i = n, \]

this leads to

\[ E[\beta_i \mid \beta_0] = \frac{n - \beta_0}{n-k}, \]

for any $i \ge 1$, and in particular for $\beta_1 = R_{k,n}$. □
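For small $n$, the identity of Lemma 4.2 can be verified by exhaustive enumeration. The sketch below assumes the usual parking scheme: $n$ places on a circle, each car choosing a uniform first try and parking at the first empty place found clockwise, a block being a maximal run of cars together with the empty place that follows it. Under these assumptions, the $k$-th car fills the empty place ending the block of size $L_{k,n}$ and merges it with the next block, of size $R_{k,n}$. The function names are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def merged_block_sizes(n, tries):
    """Park len(tries) cars on a circle of n places (requires len(tries) <= n - 1).

    Returns (L, R): the sizes of the two blocks merged by the last car,
    measured just before its arrival.
    """
    occupied = [False] * n
    for t in tries[:-1]:
        p = t
        while occupied[p]:
            p = (p + 1) % n
        occupied[p] = True
    # Place finally filled by the last car.
    p = tries[-1]
    while occupied[p]:
        p = (p + 1) % n
    # Block ending with the empty place p: p itself plus the cars just before it.
    left = 1
    q = (p - 1) % n
    while occupied[q]:
        left += 1
        q = (q - 1) % n
    # Next block: cars just after p, plus the empty place that follows them.
    right = 1
    q = (p + 1) % n
    while occupied[q]:
        right += 1
        q = (q + 1) % n
    return left, right


def conditional_mean_R(n, k):
    """Exact E[R_{k,n} | L_{k,n} = l], over all n^k sequences of first tries."""
    total = defaultdict(int)
    count = defaultdict(int)
    for tries in product(range(n), repeat=k):
        left, right = merged_block_sizes(n, tries)
        total[left] += right
        count[left] += 1
    return {l: Fraction(total[l], count[l]) for l in count}


if __name__ == "__main__":
    n = 5
    for k in range(1, n):
        print(k, conditional_mean_R(n, k))
```

For $n = 5$ and $k = 3$ the enumeration gives conditional means $2$, $3/2$ and $1$ for $L_{k,n} = 1, 2, 3$, in agreement with $(n - L_{k,n})/(n-k)$.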



Proof of Lemma 4.1. We find different bounds for $E[R_{k,n}]$ according to the three
different regimes of the additive Marcus-Lushnikov process. For $\varepsilon$ positive but
smaller than 1/2, set $\varphi(n) = n - n^{1/2+\varepsilon}$ and $\psi(n) = n - n^{1/2-\varepsilon}$. Also, let $B^n_{k,1} \ge
B^n_{k,2} \ge \cdots$ denote the sequence of sizes of blocks (clusters) after the $k$-th arrival
(jump), in decreasing order.

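The sparse regime. For $k \le \varphi(n)$, the largest cluster is small, and, as a consequence,

\[ E[R_{k,n}] = \frac{n - E[L_{k,n}]}{n-k} \simeq \frac{n}{n-k}, \]

or, more precisely,

Lemma 4.3.
\[ \lim_n \sup\left\{ \left| 1 - \frac{n-k}{n} E[R_{k,n}] \right| \ :\ 1 \le k \le \varphi(n) \right\} = 0. \]

Proof. By Lemma 4.2,

(15) \[ 1 - \frac{n-k}{n} E[R_{k,n}] = E\left[ \frac{L_{k,n}}{n} \right], \]

but, for $1 \le k \le \varphi(n)$, $E[L_{k,n}/n] \le E[B^n_{\varphi(n),1}/n]$ and, as a consequence of |S1
Theorem 1.1],

\[ \frac{B^n_{\varphi(n),1}}{n} \xrightarrow{P} 0. \]

Convergence of expectations follows, as $B^n_{\varphi(n),1}/n$ is bounded by 1. □

As a consequence, the contribution of this regime is

(16) \[ \sum_{k=1}^{\varphi(n)} E[R_{k,n}] \simeq \sum_{k=1}^{\varphi(n)} \frac{n}{n-k} = n \sum_{k=n^{1/2+\varepsilon}}^{n-1} \frac{1}{k} \simeq \left( \frac{1}{2} - \varepsilon \right) n\log n. \]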



The transition regime. If $n - k \sim \sqrt{n}$, $B^n_{k,1} = \Theta(n)$, so that the terms of the sum $R_n$
corresponding to the transition regime can be large. However there are few such
terms:

(17) \[ \sum_{k=\varphi(n)}^{\psi(n)} E[R_{k,n}] \le \sum_{k=\varphi(n)}^{\psi(n)} \frac{n}{n-k} \simeq 2\varepsilon\, n\log n. \]

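The almost full regime. If $k \ge \psi(n)$, again as a consequence of the Theorem 1.1 cited above,

(18) \[ \frac{B^n_{\psi(n),1}}{n} \xrightarrow{P} 1. \]

Thus, as $L_{k,n}$ is the size of a size-biased pick among the blocks, we expect that
$P(L_{k,n} \neq B^n_{k-1,1}) = o(1)$, $L_{k,n}/n \simeq 1$, and $E[R_{k,n}] = o\left( \frac{n}{n-k} \right)$.
More precisely, we have

Lemma 4.4.
\[ \lim_n \sup\left\{ \frac{n-k}{n} E[R_{k,n}] \ :\ \psi(n) \le k \le n-1 \right\} = 0. \]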
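Proof. Since $L_{k,n}$ is the size of a size-biased pick among the blocks, we should have

\[ P\left( L_{k,n} = B^n_{k-1,1} \mid B^n_{k-1,1} \right) = \frac{B^n_{k-1,1}}{n}, \]

thus

\[ E\left[ \frac{L_{k,n}}{n} \right] \ge E\left[ \frac{L_{k,n}}{n}\, 1_{\{L_{k,n} = B^n_{k-1,1}\}} \right] \ge E\left[ \left( \frac{B^n_{k-1,1}}{n} \right)^2 \right] \ge E\left[ \left( \frac{B^n_{\psi(n),1}}{n} \right)^2 \right]. \]

Now, relations (15) and (18) yield the desired result. □

Thus

(19) \[ \sum_{k=\psi(n)}^{n-1} E[R_{k,n}] = o\left( \sum_{k=\psi(n)}^{n-1} \frac{n}{n-k} \right) = o(n\log n). \]

Lemma 4.1 follows, as (16), (17) and (19) hold true for any $\varepsilon$ positive and small
enough. □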



Remark 4.5. Note that, using 



E[i?„] = E[Rn-l,n]+J2 



Pn,k I Pn,n — k . 



K_fc]+E[i?fe]), 



fe=i 



and 



Pn.k ' Pn,n — k 



1 ^fc /^fc^'' Vri-/c 
2(n-l) " ^n 



i-fc-i 



we recover [15, Relation (10.1)]. This led Knuth and Schönhage [15] to an alternative
proof of Lemma 4.1: one sees easily that

\[ E[R_{n-1,n}] = a\sqrt{n} + O(1), \]

in which $a = \sqrt{\pi/2}$, but [15, Relation (12.7)] ensures that, as a consequence,

\[ E\left[ C_n^{QFB} \right] = \frac{a}{\sqrt{2\pi}}\, n\log n + O(n). \]

However, through this type of argument, we were not able to obtain a suitable
bound for the variance.

4.2. Analysis of variance. The next Proposition completes the proof of Theorem 1.2:

Proposition 4.6. $\mathrm{Var}\left( C_n^{QFB} \right) = o\left( (n\log n)^2 \right)$.

Once again, we use the exchangeability property of the block sizes in the parking
scheme:

Lemma 4.7. For $1 \le l < k \le n-1$,
\[ E[R_{l,n} R_{k,n}] = \frac{E[R_{l,n}(n - L_{k,n})]}{n-k}. \]


Proof. Consider the $n-k+1$ blocks (clusters) before the $k$-th jump. Let us
number them clockwise from 0 to $n-k$, starting with the block that contains the
place filled by the $l$-th arrival, and let $\gamma_i$ denote the size of the $i$-th block. Let $Cl_0$
denote the random set of cars belonging to block 0, and let $\mathcal{F}$ denote the $\sigma$-algebra
generated by $Cl_0$ and $(t(c))_{c\in Cl_0}$. Also, let $\mathcal{G}$ be the $\sigma$-algebra generated by $\mathcal{F}$
and $(\gamma_i)_{1\le i\le n-k}$. It is easy to see that, among the $(n-g_0)^{k-g_0-1}(n-k)$ possible
parking configurations (given $Cl_0$ and $(t(c))_{c\in Cl_0}$), there are

\[ \binom{k-g_0}{g_1-1,\ g_2-1,\ \dots,\ g_{n-k}-1} \prod_{i=1}^{n-k} g_i^{\,g_i-2} \]

configurations such that $(\gamma_i)_{1\le i\le n-k} = (g_i)_{1\le i\le n-k}$. As a consequence, conditionally,
given $\mathcal{F}$, the family $(\gamma_i)_{1\le i\le n-k}$ is exchangeable, while $\gamma_0$ is $\mathcal{F}$-measurable,
and, being, in a sense, a size-biased pick among the $n-k+1$ blocks, tends to be
larger. Note that $\gamma_0 + \dots + \gamma_{n-k} = n$.

Given $\mathcal{G}$, the conditional probability that the $k$-th arrival fills the empty place
at the end of block $i$ is $\gamma_i/n$, entailing that

\[ E[R_{k,n} \mid \mathcal{G}] = \frac{1}{n}\sum_{i=0}^{n-k} \gamma_i\gamma_{i+1}, \qquad E[L_{k,n} \mid \mathcal{G}] = \frac{1}{n}\sum_{i=0}^{n-k} \gamma_i^2, \]

with the convention that $n-k+1+i \equiv i$. As a consequence,

\[ E[R_{l,n}R_{k,n}] = \frac{1}{n} E\left[ R_{l,n} \sum_{i=0}^{n-k} \gamma_i\gamma_{i+1} \right]. \]

Now, obviously, the relation

\[ E\left[ R_{l,n} \sum_{i=0}^{n-k} \gamma_i\gamma_{i+1} \right] = E\left[ R_{l,n} \sum_{i=0}^{n-k} \gamma_{\sigma(i)}\gamma_{\sigma(i+1)} \right] \]

holds when $\sigma$ is any power of the cyclic permutation $(0, 1, 2, \dots, n-k)$, but, due
to the exchangeability of the sequence $(\gamma_i)_{1\le i\le n-k}$, conditionally given $\mathcal{F}$, it also
holds when $\sigma$ is any permutation of the set $\{0, 1, 2, \dots, n-k\}$ leaving 0 invariant.
Thus, it holds for any $\sigma$, and, if $\mathfrak{S}_N$ is the set of permutations on $N$ elements:

\[ E[R_{l,n}R_{k,n}] = \frac{1}{(n-k+1)!\,n} \sum_{\sigma\in\mathfrak{S}_{n-k+1}} E\left[ R_{l,n} \sum_{i=0}^{n-k} \gamma_{\sigma(i)}\gamma_{\sigma(i+1)} \right] \]
\[ = \frac{1}{(n-k)\,n}\, E\left[ R_{l,n} \sum_{i=0}^{n-k} \sum_{j\neq i} \gamma_i\gamma_j \right] \]
\[ = \frac{1}{n(n-k)}\, E\left[ R_{l,n} \left( n^2 - \sum_{i=0}^{n-k} \gamma_i^2 \right) \right] \]
\[ = \frac{1}{n-k}\, E\left[ R_{l,n}\, E[n - L_{k,n} \mid \mathcal{G}] \right], \]

completing the proof of the Lemma. □



Also, using the exchangeability property for the sequence (/3i)i<i<n-/c, as in 
Section ^21 we obtain: 



Lemma 4.8. For $1 \le k \le n-1$,
\[ E[R_{k,n}^2 \mid L_{k,n}] \le \frac{(n - L_{k,n})^2}{n-k}. \]
Proof of Proposition 4.6. As in Section 4.1, we decompose the variance according
to the three distinct regimes of the parking scheme:

\[ \mathrm{Var}(R_n) \le \sum_{k=1}^{n-1} E[R_{k,n}^2] + 2\sum_{1\le l<k\le\varphi(n)} \mathrm{Cov}(R_{l,n}, R_{k,n}) + 2\sum_{\substack{1\le l<k \\ \varphi(n)<k\le\psi(n)}} E[R_{l,n}R_{k,n}] + 2\sum_{\substack{1\le l<k \\ \psi(n)<k<n}} E[R_{l,n}R_{k,n}]. \]

The square terms. By Lemma 4.8, for $1 \le k \le n-1$, $E[R_{k,n}^2] \le \frac{n^2}{n-k}$, so that

(20) \[ \sum_{k=1}^{n-1} E[R_{k,n}^2] = O(n^2\log n). \]

Covariances, the sparse regime. Thanks to Lemma 4.7, we have:

\[ \mathrm{Cov}(R_{l,n}, R_{k,n}) \le E[R_{l,n}] \left( \frac{n}{n-k} - E[R_{k,n}] \right). \]

This last inequality, combined with Lemma 4.3, entails that

\[ \limsup_n\ \sup_{1\le l<k\le\varphi(n)} \frac{(n-k)\,\mathrm{Cov}(R_{l,n}, R_{k,n})}{n\,E[R_{l,n}]} \le 0, \]

so that

\[ \sum_{1\le l<k\le\varphi(n)} \mathrm{Cov}(R_{l,n}, R_{k,n}) \le o\left( \sum_{l=1}^{\varphi(n)} E[R_{l,n}] \sum_{k=l+1}^{\varphi(n)} \frac{n}{n-k} \right) = o\left( \sum_{l=1}^{\varphi(n)} \frac{n}{n-l} \sum_{k=l+1}^{\varphi(n)} \frac{n}{n-k} \right) \]

(21) \[ = o\left( (n\log n)^2 \right). \]

Covariances when $k$ belongs to the transition regime. Thanks to Lemma 4.7,

\[ \sum_{\substack{1\le l<k \\ \varphi(n)<k\le\psi(n)}} E[R_{l,n}R_{k,n}] \le \sum_{1\le l\le\psi(n)} E[R_{l,n}] \sum_{\varphi(n)<k\le\psi(n)} \frac{n}{n-k} \le 2\varepsilon\, n\log n \sum_{1\le l\le\psi(n)} E[R_{l,n}] \]

(22) \[ \le 2\varepsilon\,(n\log n)^2. \]

Covariances when $k$ belongs to the almost full regime. Note that $\gamma = (\gamma_i)_{0\le i\le n-k}$
is the family of sizes of blocks before the $k$-th arrival, numbered clockwise starting
at some point that depends on the $l$-th jump, while $\beta = (\beta_i)_{0\le i\le n-k}$ is the same
family, numbered clockwise starting at some point that depends on the $k$-th jump:
from the proof of Lemma 4.7 we deduce that

\[ \sum_{l=1}^{k-1} E[R_{l,n}R_{k,n}] = \frac{1}{n(n-k)}\, E\left[ \sum_{l=1}^{k-1} R_{l,n} \left( n^2 - \sum_{i=0}^{n-k} \beta_i^2 \right) \right]. \]

From expression (14), we see that, conditionally, given that $\beta = (b_i)_{0\le i\le n-k}$, the
cost $\sum_{l=1}^{k-1} R_{l,n}$ is the sum of $n-k+1$ random variables distributed as $(R_{b_i})_{0\le i\le n-k}$,
and, incidentally, independent. As a consequence of Lemma 4.1 there exists a
universal constant $A$ such that

\[ E\left[ \sum_{l=1}^{k-1} R_{l,n} \,\Big|\, \beta \right] \le A \sum_{i=0}^{n-k} \beta_i\log\beta_i \le A\, n\log n. \]

Thus, for $k > \psi(n)$,

\[ \sum_{l=1}^{k-1} E[R_{l,n}R_{k,n}] \le \frac{A\log n}{n-k}\, E\left[ n^2 - \sum_{i=0}^{n-k} \beta_i^2 \right] \le \frac{A\log n}{n-k}\, E\left[ n^2 - \max_i \beta_i^2 \right] \le \frac{A\, n^2\log n}{n-k} \left( 1 - E\left[ \left( \frac{B^n_{\psi(n),1}}{n} \right)^2 \right] \right). \]

Finally

(23) \[ \sum_{\substack{1\le l<k \\ \psi(n)<k<n}} E[R_{l,n}R_{k,n}] \le o(n^2\log n) \sum_{\psi(n)<k<n} \frac{1}{n-k}. \]

Again, since (20), (21), (22) and (23) hold true for any $\varepsilon$ positive and small
enough, this completes the proof of Proposition 4.6. □

Remark 4.9. While the asymptotic behaviour of the partial costs was obtained 
by merely analytic tools, our analysis of the complete costs relies on the addi- 
tional information captured by some underlying combinatorial structure, the park- 
ing scheme, and can hardly be extended to other kernels. 



5. Asymptotics of the cost of Quick Find

This Section is devoted to the proof of Theorem 1.4. We need some notations.
First, as the cost $A_{k,n}$ of the $k$-th union of a Quick Find algorithm is a random
uniform pick among the sizes of the two clusters involved, we may write

\[ A_{k,n} = \varepsilon_k L_{k,n} + (1 - \varepsilon_k) R_{k,n}, \]

in which $(\varepsilon_k)_{1\le k\le n-1}$ is a sequence of i.i.d. random variables with law $\frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$,
independent of the parking scheme. Also, let $c(k)$ denote the car involved in the
$k$-th jump, that is, such that

\[ \#\{ c \mid 1 \le c \le n-1 \text{ and } T_c \le T_{c(k)} \} = k, \]

let the first try of $c(k)$, $t(c(k))$, be denoted $t(k)$ for the sake of brevity, and let $f(k)$
be the final place of $c(k)$. Let $\mathcal{H}$ (resp. $\mathcal{H}_k$) be the $\sigma$-algebra generated by
$(t(c), T_c)_{c}$ (resp. by $(t(j))_{1\le j\le k-1}$ and $f(k)$). Finally, set

\[ F_{k,n} = \frac{1}{2}(L_{k,n} + R_{k,n}), \]

\[ F_n = \sum_{k=1}^{n-1} F_{k,n}, \qquad L_n = \sum_{k=1}^{n-1} L_{k,n}, \qquad D_n = \sum_{k=1}^{n-1} D_{k,n}. \]

The proof is based on the following observations: clearly

(24) \[ E[A_{k,n} \mid \mathcal{H}] = \frac{1}{2}(L_{k,n} + R_{k,n}), \]

and, since, conditionally given $L_{k,n}$, the displacement $D_{k,n}$ is uniformly distributed
on $\{1, \dots, L_{k,n}\}$, we have

(25) \[ E[D_{k,n} \mid \mathcal{H}_k] = \frac{1}{2}(L_{k,n} + 1). \]

We also need an important result about hashing with linear probing |7l llll[T^ : 

)98). 

e{t)dt. 



Theorem 5.1 (Flajolet, Poblete and Viola, 1998) 



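Due to relation (25), we have

Lemma 5.2. $\| 2D_n - L_n \|_2 = o(n^{3/2})$.

Proof. Expanding $(2D_n - L_n - n + 1)^2$, we obtain:

\[ \| 2D_n - L_n - n + 1 \|_2^2 = \Sigma_1 + \Sigma_2, \]

in which

\[ \Sigma_1 = \sum_{k=1}^{n-1} E\left[ (2D_{k,n} - L_{k,n} - 1)^2 \right], \]
\[ \Sigma_2 = 2 \sum_{1\le i<j\le n-1} E\left[ (2D_{i,n} - L_{i,n} - 1)(2D_{j,n} - L_{j,n} - 1) \right]. \]

Owing to (25), for $i < j$,

\[ E\left[ E\left[ (2D_{i,n} - L_{i,n} - 1)(2D_{j,n} - L_{j,n} - 1) \mid \mathcal{H}_j \right] \right] = 0, \]

and $\Sigma_2$ vanishes. By definition of $D_{k,n}$, we also have

\[ E\left[ (2D_{k,n} - L_{k,n} - 1)^2 \mid L_{k,n} \right] = \frac{1}{3}\left( L_{k,n}^2 - 1 \right). \]

Thus

(26) \[ \Sigma_1 \le \frac{1}{3} \sum_{k=1}^{n-1} E\left[ L_{k,n}^2 \right] \le \frac{n^3}{3} \int_0^1 E\left[ \left( \frac{L_{\lceil \alpha n\rceil,n}}{n} \right)^2 \right] d\alpha. \]

According to |2S1, for $0 < \alpha < 1$, $(B^n_{\lceil \alpha n\rceil,1}/n)_{n\in\mathbb{N}}$ converges in probability to 0, thus

\[ \lim_n E\left[ \left( \frac{L_{\lceil \alpha n\rceil,n}}{n} \right)^2 \right] = 0, \]

and Lebesgue Dominated Convergence Theorem completes the proof. □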
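The conditional second moment used in the proof of Lemma 5.2 is a one-line computation: if $D$ is uniform on $\{1, \dots, L\}$, then $2D - L - 1$ is centered and $E[(2D - L - 1)^2] = (L^2 - 1)/3$. The following small sketch (the function name is ours) checks this identity exactly for a range of block sizes.

```python
from fractions import Fraction


def centered_second_moment(block_size):
    """Exact value of E[(2D - L - 1)^2] for D uniform on {1, ..., L}, L = block_size."""
    return sum(
        Fraction((2 * d - block_size - 1) ** 2, block_size)
        for d in range(1, block_size + 1)
    )


if __name__ == "__main__":
    for size in (1, 2, 3, 10):
        print(size, centered_second_moment(size), Fraction(size * size - 1, 3))
```

This is four times the variance $(L^2-1)/12$ of the uniform distribution on $\{1, \dots, L\}$.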

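As a consequence of Lemma 4.1 and Proposition 4.6,

Lemma 5.3. $\| 2F_n - L_n \|_2 = \| R_n \|_2 = o(n^{3/2})$.

Finally,

Lemma 5.4. $\| F_n - C^{QF}_{n,n-1} \|_2 = o(n^{3/2})$.

Proof. We split $\| F_n - C^{QF}_{n,n-1} \|_2^2$ in three terms:

\[ \Sigma_1 = E\left[ \left( \sum_{k=1}^{n-1} \left( \varepsilon_k - \tfrac{1}{2} \right) L_{k,n} \right)^2 \right], \qquad \Sigma_2 = E\left[ \left( \sum_{k=1}^{n-1} \left( \varepsilon_k - \tfrac{1}{2} \right) R_{k,n} \right)^2 \right], \]
\[ \Sigma_3 = 2\,E\left[ \sum_{k,l} \left( \varepsilon_k - \tfrac{1}{2} \right)\left( \tfrac{1}{2} - \varepsilon_l \right) L_{k,n} R_{l,n} \right]. \]

Since $\left( \varepsilon_k - \frac{1}{2} \right)_{1\le k\le n-1}$ are i.i.d. random variables with mean 0, independent of $\mathcal{H}$,
we find, conditioning on $\mathcal{H}$, that:

\[ \Sigma_1 = \frac{1}{4} \sum_{k=1}^{n-1} E\left[ L_{k,n}^2 \right], \qquad \Sigma_2 = \frac{1}{4} \sum_{k=1}^{n-1} E\left[ R_{k,n}^2 \right], \qquad \Sigma_3 = -\frac{1}{2} \sum_{k=1}^{n-1} E\left[ L_{k,n} R_{k,n} \right]. \]

We conclude using the same arguments as in the proof of (26), since we have

\[ \| F_n - C^{QF}_{n,n-1} \|_2^2 = \frac{1}{4} \sum_{k=1}^{n-1} E\left[ (L_{k,n} - R_{k,n})^2 \right] \le \frac{n^3}{4} \int_0^1 E\left[ \left( \frac{L_{\lceil \alpha n\rceil,n} - R_{\lceil \alpha n\rceil,n}}{n} \right)^2 \right] d\alpha. \]

□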

Finally Theorem II. 41 is obtained by combining these Lemmas with T, Theorem 
4.1]: 

Theorem 5.5. Let $(X_n)_{n\in\mathbb{N}}$, $(Y_n)_{n\in\mathbb{N}}$ and $X$ be random variables such that for
every $n$, $X_n$ and $Y_n$ are defined on the same probability space. If $(X_n)_{n\in\mathbb{N}}$ converges
in law to $X$ and if $(|X_n - Y_n|)_{n\in\mathbb{N}}$ converges in probability to 0, then $(Y_n)_{n\in\mathbb{N}}$ converges
in law to $X$.

6. Almost full regime: Proof of Theorem 1.5

Here we list the slight adaptations to be made to the previous proof, in order to
obtain Theorem 1.5. We introduce

fc=i 
and we observe that by the same proof as in the previous Section, but considering
partial sums rather than the complete sums, we obtain

(27) \[ \| D_n(\beta) - W_n(\beta) \|_2 = o(1). \]

On the other hand, as a direct consequence of [H] (see specially Theorem 4.1]),
we know that it is possible to build, on a suitably chosen probability space $\Omega$, a
version of the normalized Brownian excursion, and also a version of the parking
scheme for each possible size $m$, in such a way that, if $\sqrt{m}\,\psi_m(\beta,t)$ denotes the
number of cars that tried to park, successfully or not, on place $\lfloor tm\rfloor$, among the
$\lfloor m - \beta\sqrt{m}\rfloor$ cars already arrived, then we have:

\[ \Pr\left( \forall \Lambda,\ \psi_m(\beta,t) \xrightarrow[\text{on } A_\Lambda]{\text{uniformly}} h_\beta(t) \right) = 1, \]

in which $A_\Lambda = [0,\Lambda] \times [0,1]$.

Since $\psi_m$ captures the whole story of the parking process (for instance, it captures
the sizes and positions of blocks and the first tries of successive cars), $\psi_m$ also
describes the sample paths of the additive Marcus-Lushnikov processes with size $m$.
Specifically, the total and partial displacements have the following simple expression
in terms of $\psi_m$:

Jo 
From this relation, we obtain directly that

\[ \Pr\left( \forall \Lambda,\ D_m(\beta) \xrightarrow[\text{on } [0,\Lambda]]{\text{uniformly}} W(\beta) \right) = 1, \]

which, together with (27), entails the convergence of finite-dimensional distributions
of the positive decreasing processes $W_n(\cdot)$ to the finite-dimensional distributions
of $W(\cdot)$. This is enough to ensure the weak convergence of these processes,
seen as random variables with values in the space of tail distributions of positive
measures on $[0, +\infty]$, endowed with the topology of weak convergence of the
corresponding positive measures. These spaces are Lusin spaces, thus, according to
the Skorohod representation theorem [23, II.86.1], one can find a probability space
where the weak convergence of $W_n(\cdot)$ to $W(\cdot)$ is almost sure, and since $\beta \mapsto W(\beta)$
is almost surely continuous, it entails that $W_n(\cdot)$ converges to $W(\cdot)$ uniformly on
$[0, +\infty]$, almost surely on the probability space $\Omega$.

7. Concluding remarks 

Knuth and Schönhage gave asymptotics for the expectation of some additive
functionals of the additive Marcus-Lushnikov process, and we were able to give
more precise information, either the asymptotic behaviour of the distribution, or
a concentration result, for these functionals, by embedding the additive Marcus-Lushnikov
process in a richer structure. It would be interesting to extend such
results to Marcus-Lushnikov processes with a general kernel $K(x,y)$, but general
theorems of convergence of Marcus-Lushnikov processes seem not precise enough,
at least for the total costs, to allow such a generalisation right now. For the total
costs, our approach is quite specific to the additive case, and even in the important
case $K(x,y) = xy$ it seems rather hard to improve the results of Bollobás & Simon
[2], who show that the average cost of QFW is $cn + O(n/\log n)$, $c = 2.0847\cdots$,
while the average cost of QF is $n^2/8 + O(n(\log n)^2)$.